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I. INTRODUCTION 

The study of evolving graphs as a means to describe the power-law degree distribution of large networks 
has become increasingly relevant in recent years, starting with the study of the preferential attachment 
model of graph growth 1] that models a diverse range of man-made and natural networks. Graphs 
that grow by duplication of existing vertices Q, H, Q are particularly relevant to the study of biological 
networks, including protein-protein interaction networks and genetic regulatory networks, because they 
mimic the process of gene duplication by duplication of vertices, i.e, by creation of new vertices that 
have exactly the same set of connections as pre-existing vertices in the graph. Various processes of graph 
growth in which duplication forms one element of the growth dynamics have been shown to exhibit scale- 
free behavior at late times, characterized by a power-law dependence of the degree distribution p{k) of 
the graph, i.e., p{k) ~ k'^ , where 7 is the scaling exponent 5]. This has led to the notion that biological 
networks possess features in common with other well-studied, albeit disparate, networks, including the 
Internet and metabolic networks 0, • A particularly attractive feature of such scale- free networks is 
their putative robustness and tolerance of error 

At the same time, it is not so well-known that graphs that grow predominantly by the duplication pro- 
cess have features that are distinct from other scale-free graphs. These features become particularly stark 
and revealing in the limit of pure duplication growth. One such feature is the lack of the "self-averaging" 
property i.e., the property that an individual realization of graph growth does not asymptotically 
reach the degree distribution of an ensemble of such realizations. Specifically, it was shown 2] that the 
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number of distinct "orbits" (the subsets of nodes that are connected to exactly the same sets of nodes) 
remains invariant under any one reahzation of pure duphcation growth. Therefore the number of distinct 
degrees of the graph (where the degree of a node is defined as the number of its nearest neighbors) also 
remains invariant. This lack of "self-averaging" property may be formalised into an appropriate notion 
of lack of ergodicity in the graph dynamics. 

Another distinct feature of duplication graphs is the lack of clear emergence of an asymptotic (long- 
time) solution for the degree distribution of an ensemble of realizations [13 ■ While the dynamics of a 
single realization of the duplication process can be clarified 0] in terms of invariance in the number of 
orbits, the dynamics of an ensemble of such processes is quite non-trivial (because of lack of self-averaging) 
and is discussed below. In a model proposed earlier 0| that includes duplication as well as mutation by 
edge removal and addition, a breakdown of the asymptotic stationary solution is found to occur in the 
analysis. For a range of parameters in which duplication is the dominant process of graph growth (the 
duplication-dominated regime) , the analytically obtained stationary solution has negative average degree 
and the scaling exponent does not agree with that obtained from numerical simulations. Further analysis 
of the same model Q reveals that the degree distribution at late times depends sensitively on initial 
conditions, although the dependence itself is not clarified. 

One of the common threads in the analysis of duplication graphs is the assumed existence of a non- 
trivial, asymptotic, stationary degree distribution. While the scale-free preferential attachment model 
and other related models do have an asymptotic solution that is stationary, this is not generally true. For 
our purposes, we will define a stationary degree distribution to be a time-independent, non-zero degree 
distribution [ij. 

A number of questions naturally emerge from the above observations. Some of them are: 

(1) Do ensembles of duplication graphs (i.e., graphs in which the mechanism of growth is predominantly 
by means of duplication) have stationary asymptotic degree distributions? 

(2) Do ensembles of duplication graphs exhibit asymptotically scale-free behavior? 

(3) How does the asymptotic degree distribution depend on initial conditions? 

In this work, these questions are first answered in the context of pure duplication growth, where, as is 
shown below, an exact solution for the degree distribution at all times can be obtained analytically. This 
is followed by a discussion of these issues in mixed models that contain duplication as a component of the 
dynamics. It is conjectured that duplication-dominated growth may serve to define a new class of models 
that are asymptotically non-stationary (or, at best, quasi-stationary) but nevertheless may exhibit scale- 
free behavior. In spite of their lack of asymptotic stationarity, these models could well describe realistic 
biological networks. 



II. PURE DUPLICATION GROWTH 



Consider an undirected graph that grows by pure duplication. We will assume that the graph has mp 
vertices at time t = 0, and that time progresses in units of 1. At each time step, an existing vertex is 
picked at random and duplicated, i.e., a new vertex is added to the graph with the same set of edges 
as an existing vertex. The number of vertices therefore increases by one at each time step and the 
total number of vertices at time t is t + mo- Consequently, the maximum possible degree at time t 
is kjnax{t) = t + Too — 1. As shown earlier 0, any specific process of this type (i.e., a realization of 
this dynamics) leaves the number of orbits, and therefore, the number of distinct degrees in the graph, 
invariant. We will, however, consider the dynamics of an ensemble of such processes and denote the 
degree distribution of this ensemble by p(k, t) = the probability of finding a vertex of degree k at time t. 

Since every vertex has equal probability of being duplicated at a given time step, the probability 
Pncw{k, t) that a new vertex has degree k at time t is given by 

Pncw{k,t) ^ p{k,t - 1). (1) 

Furthermore, the probability Pndup(fc', t) that a vertex of degree k' is a neighbor of a duplicating vertex 
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is proportional to its degree. Demanding that a vertex of maximum degree is a neighbor of a duplicating 
vertex with probability 1 then gives 

P„dup(fc,t) = fc(mo + t-l)-^ (2) 

From the above we can find the number of vertices of degree k at time t as 

n(fe,t) = pncw{k,t){n{k,t - 1) + I) + (1 - pncw{k,t))n{k,t - 1) 

+Pndup(fc - 1, t)n{k - 1, t - 1) - _Pndup(fc, t)n{k, t - 1), (3) 

where the first two terms on the right-hand-side (RHS) of the above equation describe the contribution 
of the duplicating vertex itself and distinguish between the two cases: (a) the duplicating vertex is of 
degree fc, and (b) the duplicating vertex is not of degree fc; the third term comes from vertices of degree 
fc — 1 increasing their degree because they are neighbors of a duplicating vertex, and the fourth term is 
a loss term for vertices that were of degree k at the previous time and have since increased their degree 
due to neighbor duplication. 

Noting that n(fc, t) = p(fc, t){t + too)i one can derive the following master equation for p(fc, t): 

p{k, t) - p{k, t-l) = ^-^p{k - 1, i - 1) - -^—Pik, t-l). (4) 
t + niQ t + niQ 

The above equation holds for all A; > (with p{—l,t) — for all t). However, the dynamics of isolated 
vertices (vertices of degree 0) is decoupled from the dynamics of higher degree vertices. Indeed, one 
obtains p{0,t) is constant for all time and p{k,t) for fc > 1 does not depend on p{0,t). Because of 
this decoupling property, we will only consider solutions of Eq. Q for k > I, supplemented by the 
equation p(0,t) = p(0, 0). Correspondingly, we will only consider graphs with a minimum degree of 1, 
with the understanding that ensembles of graphs that contain isolated vertices can be subdivided into two 
ensembles, one ensemble of graphs whose minimum vertex degree is 1, and another ensemble of graphs 
that only contain isolated vertices. The dynamics of these two ensembles is then decoupled and we may 
only consider the non-trivial dynamics of the ensemble with minimum vertex degree of 1. 

By inspection of Eq. I^J), a naive solution is obtained. This is a "stationary" solution with scaling 
exponent 7 = — 1 satisfying kp{k) = (fc — l)p{k — 1), i.e., p(fc) ^ k^^. Note that this solution is not a 

global solution at any finite time because it is not correctly normalised: demanding ~ ^ 

causes the solution to be non-stationary, in which case it is not a solution at all. This solution can, at 
best, therefore be an asymptotic solution, and even so, hold only for finitely many values of fc, because 
the sum of fc^^ over infinitely many values of fc is divergent, and the normalization condition would fail 
to hold. Indeed, from an analysis of the exact degree distribution below, we find that this stationary 
solution is not an asymptotic solution at all, although the pure duplication growth limit in earlier analyses 
IE S yields a scaling exponent of — 1 . 



A. Exact degree distribution for pure duplication growth 

It turns out that the master equation I^J is simple enough to solve exactly in terms of the initial degree 
distribution p{k, 0). By writing out each term on the RHS of the master equation in terms of distributions 
at earlier times, one notices that p(fc, t) is a sum of terms of the general form 

n. ■ ^^ (fe - - 2) ■ ■ ■ (fc - i)(t + niQ - k){t ~ 1 + mp - fc) ■ ■ ■ (z + 1 + mp - fc) 
' (t + mo)(i + mo-l)...(mp + l) ' 
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where i runs from to t. Furthermore, there are tl/ (il{t — i)l) terms of this type. Putting aU this together, 
one obtains 



i=0 



{k ~ 1) ■ ■ ■ {k ~ i){t + mo - k) ■ ■ ■ {i + 1 + mo - k) 
{t + mo){t + mo — 1) ■ ■ ■ {mo + 1) 

Changing the dummy variable i to j — k — i and noting that, in the initial distribution, the minimum 
degree is 1 while the maximum degree is mp — 1, one finally obtains, after some simplification 

/ \ —1 min(/c,mo — 1) , \ / \ 

^ i=max(fc-ta) ^ / \ / 

In the sum above, it is understood that for values of k and t such that the lower limit of the sum is 
larger than the upper limit, p{k,t) = 0. The above solution corresponds to a mixture, via the initial 
distribution, of a hypergeometric distribution 12] and may be readily verified by direct substitution into 
Eq. gjl. 



B. Asymptotic analysis 

The exact degree distribution, Eq. (Q above, shows that, for t mo, there are three regimes of k 
values for which the degree distribution has potentially qualitatively different behavior. The first regime 
is 1 < fc < mo — 1, for which only terms from j = 1 up to j = fc contribute in the sum. The second regime 
is mo — l<k<t + l, for which the entire support of the initial degree distribution contributes to the 
sum (i.e., all terms from j = 1 to j = mo — 1). The third regime is i + 1 < fc < fcmax(i) = t + mo — 1, for 
which only terms from j = fc — i up to j mo — 1 contribute to the sum. At late times, the number of 
distinct fc values in the second regime {t + mo — 1 values) is much larger than the number of distinct fc 
values in the first and third regimes. We will therefore restrict our analysis to values of fc that correspond 
to the second regime. For the asymptotic analysis below, we will further assume that mo <C fc <tC t. 

In order to study the late-time behavior of the degree distribution, the asymptotic expansion of the 
Gamma function jTg] is used to obtain the following asymptotic results, valid for mo <^ k <^ t and 
1 < j < mo - 1: 

Wo — J J [mo — J)- 



k-l\ 
J-1 



(10) 



(— Y)i(i + o(fc-^))- 

Substituting these into Eq. Q, one obtains, for mo <C fc i, 

p{k,t) - ^ E ( 7-^1 ) (7) ^O'O) + . (11) 
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Since every successive term in the sum above is multiplied by an additional factor of k/t <^ 1, the 
dominant contribution to p{k, t) comes from the lowest non-zero value of j such that p{j, 0) ^ 0. This 
value of j is the lowest non-zero degree in the initial ensemble of graphs. Defining /cmin as the lowest 
non-zero degree in the initial distribution, we obtain the approximate asymptotic result 

p(fc,t)~^(;;;;°^~_\) (^^y"""^'p(fc.„,„,o) {i+o{k~')). (12) 

It follows that the asymptotic degree distribution approaches zero as t^'^""" for large t and is therefore 
non-stationary. However, for large, finite the following result is obtained. 

The asymptotic degree distribution for pure duplication graphs, although non-stationary, 
has a scaling exponent of 7 = /cmin — 1, where fcmin is the smallest non-zero 
degree in the initial graph. 

In particular, the scaling exponent is positive when /cniin > 1- [This behavior does not cause any 
normalization problems as i — > 00 because p{k, t) in this limit] . Figure 1 gives plots of the asymptotic 
degree distribution generated from numerical simulations of the master equation. For the case fcmin = 1, 
it is found that the degree distribution is uniform, while for fcmin = 2, p{k) has a linear dependence on fc, 
consistent with the above result. 

For realistic graphs, such as most biological networks of interest, it is usually the case that fcmin = 1- 
If these graphs evolved by means of a pure duplication process, the late-time degree distribution of an 
ensemble of such graphs would be dominated by a uniform distribution. We now examine the features of 
the asymptotic degree distribution that are amenable to a direct analysis of the master equation. 



C. Direct asymptotic analysis 

The asymptotic behavior of the degree distribution obtained so far relies on knowledge of the exact 
solution . It is of interest to know what features of the asymptotic degree distribution can be obtained 
directly from the master equation, without recourse to the exact solution. This is especially important in 
the analysis of more complex models, where the exact degree distribution for all time and for all values 
of fc may be analytically intractable. 

We first note that the lack of existence of a stationary asymptotic degree distribution may be deduced 
immediately from a generating function 'l4] approach to the problem. Assuming that a stationary 
asymptotic degree distribution exists, with p{k,t) = p{k,t — 1) = p{k) for all fc > 1, and defining the 
generating function 

00 

=^x^p(fc), (13) 

k=l 

one obtains from the master equation Q the following equation for (f>{x): 

x{x- 1)^ = 0, (14) 
ax 

which gives (l){x) — constant for < a; < 1. [The normalization condition 4>{l) = 1 then implies that 
the constant equals 1]. This is inconsistent with the fact that p{k) ^ for some fc > 1. Hence the 
assumption of a stationary distribution leads to a contradiction and therefore a stationary distribution 
cannot exist. 
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In order to analyse the non-stationary asymptotic distribution, one may assume that the asymptotic 
degree distribution is of the separable form: 

pik,t)^J29c{t)fc{k), (15) 

c 

where the possible values of c are to be determined. Since the master equation is a linear homogenous 
equation, one may further demand that every term in the above sum satisfies the master equation. [Note 
that the true asymptotic solution is indeed of the form H15|) .] A typical term in the sum above can 
then be substituted into the master equation Q. After some rearrangement of terms, one obtains 

Since the LHS of the above equation is a function of t alone and the RHS a function of k alone, each side 
must be separately constant, leading to the pair of equations 

Equation H17|l above gives rise to divergent growth in gc{t) if c < 0. A requirement is therefore c > (the 
c = case corresponds to a stationary solution, which has already been eliminated), a condition on the 
allowed values of c. With this condition, Eqs. (fT7|l and ((TH|) are readily solved to yield 

/n^ r(mo + l) r(t + mo-c+l) 
9cit) = 9c{0)— -TT— F77- —T-. ^ 19 

r(mo-c-t-i) r(< + 7710 + 1) 
Mk) = /,(i)r(2 - c) ^J}'^!^^^ - k^-' (20) 

Therefore, a power-law degree distribution with exponent 7 = c— 1 is obtained. From Eq. (|19|l . the lowest 
possible value of c will dominate the late-time behavior. Note that, although c is as yet undetermined, 
the analysis establishes the correct relationship between the exponent characterizing the rate at which 
the degree distribution falls to zero (— c) and the scaling exponent (c — 1). This is evident by comparison 
of Eqs. H19|) and H2(J|) to Eq. (|12|l . This relationship is a testable one. As shown in the next section, a 
similar relationship can be derived from the asymptotic analysis of a more complex model. 

In order to obtain the allowed values of the constant c by a direct asymptotic analysis, we resort to 
an eigenvalue method that is described in the Appendix. The method shows that the allowed values of c 
are the positive integers, c = 71, 77, = 1, 2, . . ., consistent with the exact solution 0. The lowest possible 
value of c is then c = 1, giving rise to a uniform degree distribution at late times. We are therefore able 
to capture most features of the exact solution by a direct asymptotic analysis, the missing feature being 
the relationship between the initial degree distribution and the lowest value of c. 



III. A DUPLICATION-MUTATION MODEL 

We now consider a more general, two-parameter growth model suggested earlier Q as a model for the 
evolutionary growth of the proteome that involves both duplication and mutation events. The model 
includes pure duplication growth as a special case. Assuming that the initial graph has ttiq nodes, the 
graph evolves according to the following rules: (i) a vertex is selected at random and duplicated, (ii) the 
links emanating from the newly generated vertex are removed with probability S, and (iii) new links are 
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created between the new vertex and all other vertices with probability (3 /{t + toq — 1) (where t + toq 
are the total number of vertices in the graph at time t) . The processes of link addition and removal are 
necessarily correlated. However, for (5 1, it is reasonable to approximate the evolution by uncorrelated 
addition and removal 4]. With this assumption, the master equation for p{k,t) is 

p{k,t)~p{k,t-l) - i^±^p{k + l,t-l)-'^±^p{k,t-l) 
t + rriQ t + mQ 

+ (21) 

t + mo 

Although the above equation describes the duplication- mutation process only for 6^1, the equation is 
still a valid master equation for all values of S and will be studied for all values of S first before focussing on 
the duplication-dominated regime (5^1. The equation that describes the duplication- mutation process 
for all values of 5, a further generalization of Eg. 121|) above, has also been derived Jj and its asymptotic 
behavior for S > 1/2 has been studied in detail Q- The eventual case of interest here is (5 < 1/2. Thus, 
Eq. (|21|l will be sufficient for our purposes. Note that the limiting case S = 0, f3 = corresponds to pure 
duplication growth. 



A. Condition for an asymptotically stationary degree distribution 

To obtain the condition for the existence of an asymptotic stationary distribution, we assume a sta- 
tionary normalizable distribution to begin with, proceed with the analysis, and search for a contradiction 
for some range of parameters. Indeed, setting p{k, t) = p{k, t — 1) = p{k) in Eq. H21|) . one obtains, 

(fc + l)Sp{k + l)-(k + 2(3)p{k) + ((1 - 5){k - 1) + 2/3) p{k - 1) = 0. (22) 

The corresponding generating function (j){x) is given by 

0(x) =^ a;*p(fc)- (23) 
fc 

Before analysing the equation satisfied by the generating function, it is important to note that, for p{k) to 
be a normalizable probability distribution, the series H23|) must converge in the limit x — *■ 1. Furthermore, 
4i{x) must be analytic at a; = 0. With this in mind, we turn to the equation satisfied by (j){x): 

((1 -6)x-5)'^ + 2/30 = 0. (24) 
ax 

The solution to the above equation is 

</>(x) = a5-^^'^^-'^ I 1 - x{8-^ - 1) (25) 

where a is an integration constant. The above expression is analytic at a: = and can be expanded in a 
Taylor series in powers of x to obtain the probabilities p{k). However, the Taylor series converges only if 

X I {5-^ - 1) < 1. (26) 

Demanding that the series converge as a; ^ 1 then yields the condition 5 > 1/2. For 6 < 1/2, the series 
is divergent ^15j , which contradicts the assumption of a stationary normalizable probablility distribution 
p{k). Therefore, we find that, for S < 1/2, the asymptotic distribution is not stationary 
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B. Asymptotic degree distribution for 5 < 1/2 

Since the asymptotic distribution is not stationary for S < 1/2, we may now consider solutions of H21|l of 
the separable form 

p{k,t)^ Uk)gS)- (27) 

As in the previous section, these solutions are labeled by the separation constant c. One then obtains 
the pair of equations 

(fc + l)(5/c(fc + l)-(fc + 2/3-c)/,(fc) + ((l-,5)(fc-l) + 2/?)/,(fc-l) = 0. (29) 

It is clear from Eq. (|28|l that one must have c > for gdt) to remain bounded as t ^ co. Since c = 
corresponds to the stationary case, we will restrict attention to c > 0. First, one finds from Eq. 



1 (t + m + 1)1 (m — c + 1) 

as t —^ oo. 

While the full asymptotic solution for fc{k) is difficult to obtain from Eq. ()29|l . we may carry out a 
Taylor expansion of fc{k + 1) and fc{k) for large values of k, i.e., 

/,(fc + l) ^ /c(fc) + ^ (31) 
Mk - 1) ^ ,Uk) - ^ (32) 

After substituting the above in Eq. I|29|) and solving the resulting first order differential equation, one 
finds 

^=1^-1' 
resulting in the asymptotic {t ~f oo) solution 

Again, the separation of variables analysis of the asymptotic degree distribution does not fix the allowed 
values of c. The eigenvalue method outlined in the Appendix, gives, to first order in S, 

c^2(3{l-S)+n{l-2S) + 0{S^), n = 0,1,2,... (35) 

Therefore, the late time solution will be dominated by the lowest value of c that is consistent with initial 
conditions. The lowest possible such value is c = —2/3(1 — 15). It should be emphasized, however, that the 
above range of values of c is only valid for (5 <C 1. 

Figure 2 displays a plot of the degree distribution when 5 = f3 = 0.1. In this case, the lowest possible 
value of c is c = 0.18, giving rise to an analytically predicted scaling exponent 7 ~ —0.775. Direct 
simulation of the master equation, shown in Fig. 2, gives approximate power law behavior with a scaling 
exponent of about —0.73, in reasonable agreement with the analytical result. 

It may be argued that duplication-dominated growth (S < 1/2) in this model is unrealistic because the 
mean degree {k)t grows without bound [HQ, whereas realistic, large, biological networks have small mean 
degree. This argument is, however, unfounded. For the duplication-mutation model, it has been shown 
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that (fc)f ^ t^^^^ for large t and for S < 1/2. Therefore, if S is less than but sufficiently close to 1/2 
the mean degree will grow very slowly and remain small even when the size of the graph is large. Thus, 
S < 1/2 could well be a viable region of parameter space, although, as shown here, the analysis of graph 
growth would require that the assumption of asymptotic stationary behavior be discarded. If the lowest 
allowed value of c does depend on initial conditions (as in the pure duplication case), large biological 
networks may contain important clues about the structure of such networks very early in evolution. 



IV. A MODEL WITH DUPLICATION AND PREFERENTIAL ATTACHMENT 



We now consider another two-parameter model of graph growth that also contains pure duplication 
growth as a special case but for which an asymptotic stationary distribution always exists everywhere in 
parameter space except at the point corresponding to pure duplication growth. It will be seen that, al- 
though an asymptotic stationary distribution exists, the actual degree distribution approaches its station- 
ary value very slowly in the duplication-dominated regime. Therefore, even at late times (corresponding 
to large graphs), the degree distribution is more accurately described by a quasi-stationary distribution (in 
a manner clarified below) rather than by the true asymptotic stationary distribution. This model there- 
fore serves to identify another possible feature of duplication-dominated growth, namely, quasi-stationary 
behavior, that may well hold in other, more realistic descriptions. 

The growth model is a combination of pure duplication growth, and growth by simple scale-free, 
preferential attachment IJ. We start with an initial graph at time t — with toq vertices. At each time 
step one of the following two processes can occur: 

(a) An arbitrary vertex in the graph is duplicated (all vertices have equal probability of duplication), as 
in the pure duplication growth model, or 

(b) A new vertex with m edges is added to the graph. These edges are preferentially attached to the 
high-degree vertices, i.e., the probability that an old vertex will be linked to the new one is proportional 
to its degree. 

We assume that process (a) occurs with probability pd and process (b) occurs with probability 1 — pd- 
The model therefore has two parameters, m and pd- The case pd = 1 corresponds to pure duplication 
growth, while the case Pd = corresponds to growth by preferential attachment alone. 

The master equation for such a growth model is a simple combination of the pure duplication and the 
scale-free preferential attachment master equations, 

p(fc, t) - p{k, t-l) = {(fc - l)pik - 1, i - 1) - kp{k, t - 1)} 



+ rT~^ l^-.m - p{k, t - 1) 



t + mo 



T71 

+ TTT {{k - l)p{k -l,t-l)- kp{k, t-1))}, (36) 



where 



=^fcp(fc,t-l) (37) 

k 

is the mean degree at time t — 1, and 5k,m is the Kronecker delta function. 



A. Existence of an asymptotic stationary distribution 

As before, we assume the existence of a stationary solution of Eq. H36I) and check whether the generating 
function (j){x) is analytic at x = and whether the series converges as a; — > 1. Assuming p{k,t — 1) = 
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Pd + I^^^i-^ ) ((fc - l)p{k - 1) - kp{k)) = {l-pa){p{k) - dk,m), (38) 



p{k, t) = p{k) in the limit t — > oo, one obtains for p{k) 

(fc) 

where 

(fc)oo = hm4^oo(fc)t- (39) 

The corresponding equation for the generating function (pix), for ^ 1, is 

x(l — x) d(j) 
fx dx 

where 



X™ = 0, (40) 



M--y^ + ^>0. (41) 

Equation H40|) can be solved to yield, after some simplification and a variable change, 

(j){x) = ~ x)" x"" dss^+'^-^il-xs)-"-^ +a(^^-^y , (42) 

where a is an integration constant. Note that the radius of convergence of the Taylor expansion of (l — x)^ 
is 1 and that the radius of convergence of the Taylor expansion of (1 — xs)^^~^ is 1/s > 1. Furthermore, 
every term in the Taylor expansion of (1 — xs)~^~^ can be integrated to give a finite result, provided 
m 7^ 0. Thus (t){x) is analytic at x = 0, provided a = and m ^ 0. We therefore set the integration 
constant a = 0. To show that the Taylor expansion converges at x = 1, it is not enough to know that 
the Taylor expansion about a; = has a radius of convergence of 1. We further need to show that the 
integral over s gives a finite result at a; = 1. 

In fact, the integral is divergent at a; = 1 for any /i > 0. However, the factor of (1 — a;)'' outside the 
integrand tends to as a; ^ 1. A more careful analysis is therefore required. To do this, we change 
variables from s to s' = (1 — xs)/{l — x) and rewrite 4>{x) in the form 

.(l-x)-i 

c^ix) = fix-f" / ds' (1 - .s'(l - x)f^"'~^ . (43) 



Setting a; = 1 in the above yields (/>(1) = 1, as required by normalization. 

We therefore find, for all < < 1 and m > 0, that the asymptotic distribution is stationary for this 
type of growth. However, to find the stationary distribution and the corresponding scaling exponent, we 
need to find (A:)oo, the asymptotic mean degree. 



B. Asymptotic mean degree 

The recursion equation for the evolution of the mean degree can be obtained by multiplying both sides 
of Eq. by k and summing over k. One obtains 

(fch = (fc)*-r(l + |^)+^^?^. (44) 
V t + mo J t + mo 
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For pd < 1/2, the above recusion gives rise to a finite asymptotic mean degree: 

{k)oc^ = —^—^, Pd<l2. 45 
1 - 2pd 

For pd > 1/2, the mean degree grows without bound as t ^ oo. To see this, we propagate Eq. (|44|l back 
to t ^ 0, giving 

^ r{t + mo + 2pd) L nmo + 1) 2m(l-p,)y + "^") 1 . (46) 

^ ^* r(t + mo + l) 1^ ^°r(mo + 2pd) ^ ^""^ r(i + mo + 2pd) J ^ ' 

The cases = 1/2 and pd > 1/2 are considered separately. For pd ~ 1/2, the above equation simphfies 
to give 

t 

(fc)* = (fc)o+™^(mo + i)-\ Pd-1/2. (47) 

1=1 

For large t, one obtains the asymptotic behavior 

mo 

(fc)t=mlni+(fc)o-TO^j-^+™C + 0(i-i), Pd = l/2, (48) 

where C is Euler's constant. Thus the mean degree for — 1/2 grows logarithmically to infinity as 
t oo. 

For Pd > 1/2 (the duplication-dominated regime in this model), the sum over i in Eq. H46|l can be 
explicitly performed by expressing the ratio of Gamma functions in the sum in terms of the Beta function. 
Using an integral representation of the Beta function [l^ , and interchanging the sum and the integral, 
one finds, 

^ T{t + mo + 2pd) r(mo + l) f 2m{l-pd) \ 2m{l ~ pg) 

^ r(t + mo + l) r{mo + 2pd)V^° 2pd ^ I j 2pd - I ^ ' 

^ r(mo + l) r 2m{l-pd)) _ 2m{l-pd) 

r(mo + 2pd) y 2pd-l j 2prf-l ' ' 

where the second equation above holds for large t. Again, one finds for pd > 1/2, that the mean degree 
grows without bound as a positive power of t for large t. 

We thus find that {k)oc = oo for pd > 1/2. Combining this result with the result 145|) for pd < 1/2, we 
obtain 

^i - 2(1 -Pd), Pd < 1/2, (51) 
= Pd' - 1, Pd> 1/2. (52) 



C. Asymptotic stationary distribution and quasi-stationary correction 

In order to obtain the asymptotic stationary distribution, we may directly solve the recursion of Eq. (|38|) 
for k > m. One finds 

( xr(TO + Ai + i) r(fc) r(m + ^ + i) 1 

pik) — phn) ; — ^ ^ p(m) — k , 53) 

' ' Vim) T{k + ^ + l) ' r(m) ' ^ ' 
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where the last expression holds for fc 3> to. A scale-free, stationary distribution therefore emerges, with 
scaling exponent 7 = — /i — 1, and ^ given by Eqs. H51|l and H52I) above. Note that this result breaks down 
in the pure duplication limit pd = 1^ because in this limit the asymptotic distribution is not stationary, 
as discussed earlier. 

Although the above result for the scaling exponent is correct for infinitely large graphs, the scaling 
exponent for large but finite graphs may not even agree approximately with the asymptotic scaling 
exponent. To see this, note that the asymptotic scaling exponent 7 = — /i— 1 was obtained by substituting 
the value of the mean degree at t = 00 into the definition of /i. For pd < 1/2 this mean degree is finite 
and it is expected that, as the graph grows, the mean degree will quickly approach its asymptotic value. 
However, in the duplication-dominated regime, pd > 1/2, the asymptotic mean degree is infinite and 
therefore never approached, even if the graph is large. A simple example is the case pd = 1/2, for which 
the mean degree grows logarithmically with the size of the graph and may therefore be small even for 
large, finite graphs. Therefore, for values of pd greater than or equal to but close to 1/2, it may be 
a better approximation to replace fi (and therefore 7) by its time-dependent value (obtained from the 
time-dependence of (fc)t). This corresponds to a quasi-stationary correction to the asymptotic stationary 
distribution, applied for large but finite graphs. 

Specifically, in the quasi-stationary regime, we have p{k,t) ^ k''^^^ with 7(i) = -^(i) - 1 and 

Mr^ = T^ + 7^, (54) 
1 - Pd {k)t 

where, for large but finite graphs, {k)t is given by Eq. H48(l for pd = 1/2 and by Eq. (|5()|l for pd > 1/2. 
The scaling exponent therefore slowly drifts towards its true asymptotic value as the graph grows larger. 

The effect of the quasi-stationary correction is studied in Figure 3 for the case pd = 1/2 and m = 6. 
The graph is grown to approximately 1000 vertices. In this case, the scaling exponent at i = 00 is 
7 = —2, while the quasi-stationary correction gives (fc)iooo — 31.07, IjL~^ 1.19, and a scaling exponent 
7 2± —1.84. This is in better agreement with the actual scaling exponent of about —1.8 obtained from 
the plot than the value —2. 



V. DISCUSSION 



The asymptotic degree distributions in three models for graph growth have been analysed in this 
article: growth by pure duplication, and two two-parameter models in which duplication forms one 
element of growth. While pure duplication growth may be an unrealistic mechanism for a number of 
reasons (including lack of ergodicity, linear growth of the mean degree with the size of the graph, etc.), 
it serves as a useful idealized test case for the study of qualitative features, such as asymptotic non- 
stationarity and sensitivity to initial conditions, that may be present in more complex, more realistic 
models. By analysis of the exact degree distribution in the pure duplication model, we find that the 
asymptotic degree distribution of an ensemble of graphs subject to pure duplication growth is indeed 
non-stationary but nevertheless exhibits power-law behavior with a non-negative exponent that depends 
on initial conditions in a simple way - the power law exponent is related to the lowest non-zero degree in 
the initial graph. The nature of the asymptotic degree distribution is also found from a direct asymptotic 
analysis of the master equation characterizing pure duplication growth, although such an analysis, being 
valid only in the asymptotic regime, does not relate the scaling exponent to initial conditions. 

The lack of existence of a stationary degree distribution is also found to occur in the duplication- 
dominated regime ((5 < 1/2) of the duplication- mutation model. For this model, 6 = 1/2 defines a critical 
boundary in parameter space that separates non-stationary and stationary asymptotic behavior. This 
also happens to be the critical boundary separating finite asymptotic mean degree and infinite asymptotic 
mean degree 01 • It is argued that, if S is less than but sufficiently close to 1/2, such a model could still 
describe realistic graphs, because the mean degree would increase very slowly with the size of the graph. 
The non-stationary asymptotic behavior of such duplication-dominated graphs could well depend on 
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initial conditions in a manner similar to the pure duplication case, via the lowest allowed value of the 
constant c that is consistent with initial conditions. 

For the model containing duplication growth combined with preferential attachment, an asymptotic 
stationary distribution is found to exist for all < 1. However, for the duplication-dominated regime, 
Pd > 1/2 (the critical boundary separating finite asymptotic mean degree and infinite asymptotic mean 
degree), the asymptotic degree distribution is more realistically described by a quasi-stationary distri- 
bution that takes into account the fact that the mean degree is always finite for large but finite graphs. 
Pd = 1/2 can then be interpreted as a critical boundary separating stationary and quasi-stationary asymp- 
totic degree distributions. On both sides of the critical boundary, the degree distribution has power-law 
behavior. 

These results suggest that duplication-dominated graph growth may serve to model a new class of large 
networks whose degree distributions, although displaying power-law behavior, are not well-approximated 
by stationary distributions, even when these networks have large size. Based on the models studied, we 
have found that at least two kinds of non-stationary asymptotic behavior can occur in such networks: 
(a) one in which the probabilities drift to zero while the scaling exponent remains invariant as long as 
the network is large enough (non-stationary behavior), and (b) one in which the probabilities eventually 
converge to a non-zero, power-law distribution but the scaling exponent drifts slowly to its asymptotic 
value (quasi-stationary behavior) . We also find that the scaling exponent will depend on initial conditions 
in both cases: in the non-stationary case, this dependence occurs via the allowed lowest value of the 
separation constant c, while in the quasi-stationary case, the scaling exponent depends on the mean 
degree in the initial graph, via Eq. (|54|l . Thus, duplication-dominated, scale- free networks may well 
contain early, and possibly identifiable, evolutionary remnants. 

We leave open to future work the question of the relationship, if any, between asymptotic stationarity 
of the degree distribution and ergodicity in the graph dynamics. 
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Appendix: An eigenvalue method for analysing the time dependence of the degree 
distribution 

Consider the duplication- mutation model of Section 3. At late times {t :S> mo), Eq. (PT|l can be expressed 
approximately as a differential equation in the time variable: 



dpjt) 
fi(lnt) 



= Ap(t), 



where p(t) is a t-dimensional vector representation of the degree distribution, 
p{t) = [p(0, t) p{l, t) p{2, t) . . . p{t — 1, t)], and the t xt matrix A is given by 



-2/3 5 
2/3 -(1 + 2/3) 
1-8 + 2(3 





25 

-(2 + 2/3) 





35 







2(1 -5) + 2/3 -(3 + 2/3) 4(5 



The general solution to Eq. H55|) is 



(55) 



(56) 



n=0 



(57) 



where {A„} are the eigenvalues of the matrix A and the time- independent vectors q*^"^ depend on the 
eigenvectors of A and on the initial degree distribution. Note that the above solution justifies the 
separation-of-variables assumption made in Sections 2 and 3. 

In order to obtain the time-dependence of the degree distribution, we are interested in the eigenvalue 
spectrum of A. While it is difficult to obtain the eigenvalues of A in general, it is quite straightforward 
to obtain them to leading order in 6. Indeed, when 5 = 0, the eigenvalue equation det(A — AI) = 

immediately yields the eigenvalues (denoted by Xn'^) 



AW = -n-2/3, n = 0,l,2. 
For (5 7^ 0, one finds, to first order in 5, 



(58) 



det(A-AI) = (-l)*i [| (n + 2/3 + A) 

t-l /n-2 \ / t-1 N 

- ^ n5 rQ(/ + 2/3 + A) (n - 1 + 2/3) ]J (? + 2/3 + A) 
+0(<5') 

By solving for the eigenvalues to first order in 5, one obtains 

A„ = -n{\ - 25) - 2/3(1 - 5) + 0{5'^). 



(59) 



(60) 



The above gives the allowed values of the constant c in Section 3, c = — A„. At late times, the degree 
distribution is dominated by the largest A„ (lowest c), obtained by setting n = 0, as Aq = — c = —2/3(1 — 5). 
The results of the pure duplication growth described in Section 2 may be obtained by setting 5 = 
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0,(3 = in the above and removing the first row and first column of the matrix A (corresponding to 
decouphng the dynamics of isolated vertices from non- isolated ones) . Removal of the first row and column 

is equivalent to discarding the eigenvalue A = 0. Denoting the remaining eigenvalues by An°'°'' , we then 
have 

X^y^ = -n,n= 1,2,... (61) 

The largest possible eigenvalue is then —1, resulting in c = 1 and a uniform degree distribution as argued 
in Section 2. 
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